[Quant] Molmo SupportsQuant #13336
Conversation
Signed-off-by: Kyle Sayers <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a reduced set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
```diff
- class MolmoVisionBackbone(nn.Module):
+ class MolmoVisionBackbone(nn.Module, SupportsQuant):
+     packed_modules_mapping = {"merged_linear": ["gate_proj", "up_proj"]}
```
Can we remove this from L1440 now that it's defined in the inner model?
I believe that line is still required by SupportsLoRA:
Lines 342 to 343 in ce77eb9
```python
# The wrapper copies the inner model's mapping so the top-level model exposes it.
self.packed_modules_mapping = copy.deepcopy(
    self.model.packed_modules_mapping)
```
Theoretically, the packed_modules_mapping attribute should be defined at the lowest level possible, i.e. on every Module with packed modules.
I think in practice we should add SupportsQuant and packed_modules_mapping to all classes which define a load_weights method, since these seem to be the classes that are composable with other models.
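For illustration, a minimal sketch of that pattern; the class name, mapping contents, and import path below are assumptions for the example, not code from this PR:

```python
from torch import nn

from vllm.model_executor.models.interfaces import SupportsQuant


class ExampleVisionEncoder(nn.Module, SupportsQuant):
    # Declared on the lowest-level module that owns these packed weights,
    # alongside the load_weights method that consumes them.
    packed_modules_mapping = {"qkv_proj": ["q_proj", "k_proj", "v_proj"]}

    def load_weights(self, weights):
        # submodule-specific weight loading would go here
        ...
```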
I see, thanks for the explanation!
> I think in practice we should add SupportsQuant and packed_modules_mapping to all classes which define a load_weights method, since these seem to be the classes that are composable with other models.
Sounds good to me!
Signed-off-by: Louis Ulmer <[email protected]>
Purpose
- Add the SupportsQuant interface and packed_modules_mapping to the Molmo models
- Work towards removing the configure_quant_config function once SupportsQuant has been added to a sufficient number of models
  - configure_quant_config assumes that all packed_modules_mappings will be declared prior to initialization. In reality, some models are submodels of each other, so their mappings can only be determined at init time.
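As a rough illustration of the init-time problem (the class names below are hypothetical, not taken from the Molmo code), a wrapper model may only be able to derive its mapping from whichever submodel it instantiates:

```python
import copy

from torch import nn


class InnerModel(nn.Module):
    packed_modules_mapping = {"merged_linear": ["gate_proj", "up_proj"]}


class OuterModel(nn.Module):
    # No class-level mapping can be declared here: the packed modules depend
    # on the inner model chosen at construction time, so the mapping only
    # exists after __init__ runs, too late for a check that inspects the
    # class before initialization.
    def __init__(self):
        super().__init__()
        self.model = InnerModel()
        self.packed_modules_mapping = copy.deepcopy(
            self.model.packed_modules_mapping)
```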
Prerequisites
- Add SupportsQuant to phi3 and clip #13104

Changes
- Add SupportsQuant interface to MolmoVisionBackbone, MolmoModel, and MolmoForCausalLM